{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 1. 字符串的创建与驻留机制"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Python 2352475873832\n",
      "Python 2352475873832\n",
      "Python 2352475873832\n",
      "False\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "'\\n字符串驻留机制的优缺点：\\n当需要相同的字符串时可以直接从字符串池中拿来使用，避免频繁的创建和销毁\\n在需要进行字符串拼接时建议使用str类型的join方法，而非+，因为join()方法是先计算出所有字符中的长度，然后再拷贝，只new一次对象，效率要比+效率高\\n'"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 字符串的驻留机制\n",
    "a = 'Python'\n",
    "b = \"Python\"\n",
    "c = '''Python'''\n",
    "print(a, id(a))\n",
    "print(b, id(b))\n",
    "print(c, id(c))  # 相同，三个对象在内存中只有一份\n",
    "\n",
    "\"\"\"\n",
    "字符串的驻留机制的几种情况（交互模式）\n",
    "1.字符串的长度为0或1\n",
    "2.符合标识符的字符串\n",
    "3.字符串只在编译时进行驻留，而非运行时\n",
    "4.[-5,256]之间的整数数字\n",
    "\"\"\"\n",
    "\n",
    "# Pycharm对字符串进行了优化处理，需要用python对话框才能看出来\n",
    "s1 = 'abc%'\n",
    "s2 = 'abc%'\n",
    "print(s1 is s2)\n",
    "\n",
    "\"\"\"\n",
    "字符串驻留机制的优缺点：\n",
    "当需要相同的字符串时可以直接从字符串池中拿来使用，避免频繁的创建和销毁\n",
    "在需要进行字符串拼接时建议使用str类型的join方法，而非+，因为join()方法是先计算出所有字符中的长度，然后再拷贝，只new一次对象，效率要比+效率高\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 2. 字符串的查询操作"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "3\n",
      "3\n",
      "9\n",
      "9\n",
      "-1\n",
      "-1\n"
     ]
    }
   ],
   "source": [
    "# 字符串的查询操作\n",
    "s = 'hello,hello'\n",
    "print(s.index('lo'))\n",
    "print(s.find('lo'))\n",
    "print(s.rindex('lo'))\n",
    "print(s.rfind('lo'))\n",
    "\n",
    "# print(s.index('k'))  # ValueError: substring not found\n",
    "print(s.find('k'))  # -1  不抛出异常\n",
    "# print(s.rindex('k'))  # ValueError: substring not found\n",
    "print(s.rfind('k'))  # -1  不抛出异常"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 3. 字符串大小写转换"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "HELLO,PYTHON 2352519751408\n",
      "hello,python 2352519970608\n",
      "hello,python 2352519969648\n",
      "True\n",
      "False\n",
      "HELLO,pYTHON\n",
      "Hello,Python\n"
     ]
    }
   ],
   "source": [
    "# 字符串的大小写转换的方法\n",
    "s = 'hello,python'\n",
    "a = s.upper()\n",
    "b = s.lower()\n",
    "print(a, id(a))  # 内存地址改变，转成大写后会产生一个新的字符串对象\n",
    "print(s, id(s))\n",
    "print(b, id(b))  # 同为小写，转换后也会产生新的字符串对象\n",
    "print(b == s)\n",
    "print(b is s)  # False\n",
    "\n",
    "s2 = 'hello,Python'\n",
    "print(s2.swapcase())\n",
    "\n",
    "print(s2.title())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 4. 字符串内容对齐操作"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "****hello,Python****\n",
      "hello,Python\n",
      "hello,Python\n",
      "hello,Python        \n",
      "********hello,Python\n",
      "        hello,Python\n",
      "hello,Python\n",
      "00000000hello,Python\n",
      "hello,Python\n",
      "-0008910\n"
     ]
    }
   ],
   "source": [
    "s = 'hello,Python'\n",
    "\n",
    "print(s.center(20, '*'))  # 居中对齐，第一个参数是宽度，第二个参数是填充符，默认为空格\n",
    "\n",
    "print(s.ljust(10, '*'))  # 左对齐\n",
    "print(s.ljust(10))  # 宽度小于字符串，返回原字符串\n",
    "print(s.ljust(20))  # 填充符默认为空格\n",
    "\n",
    "print(s.rjust(20, '*'))  # 右对齐\n",
    "print(s.rjust(20))  # 填充符默认为空格\n",
    "print(s.rjust(10))  # 宽度小于字符串，返回原字符串\n",
    "\n",
    "# 右对齐，左边用0填充，只有一个参数\n",
    "print(s.zfill(20))\n",
    "print(s.zfill(10))  # 宽度小于字符串，返回原字符串\n",
    "print('-8910'.zfill(8))  # 0填在负号之后"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 5. 字符串劈分操作"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['hello', 'world', 'Python']\n",
      "['hello', 'world', 'Python']\n",
      "['hello', 'world|Python']\n",
      "['hello', 'world', 'Python']\n",
      "['hello', 'world', 'Python']\n",
      "['hello|world', 'Python']\n"
     ]
    }
   ],
   "source": [
    "# 字符串的劈分操作，分割操作\n",
    "s = 'hello world Python'\n",
    "lst = s.split()  # 默认分隔符是空格，返回为列表类型\n",
    "print(lst)\n",
    "\n",
    "# 通过参数sep指定分隔符\n",
    "s1 = 'hello|world|Python'\n",
    "print(s1.split(sep='|'))\n",
    "\n",
    "# 通过参数maxsplit指定分隔符的最大分割次数，经过最大分割次数后，其余的子串单独作为一部分\n",
    "print(s1.split(sep='|', maxsplit=1))\n",
    "\n",
    "# 从右侧开始劈分rsplit\n",
    "print(s.rsplit())\n",
    "print(s1.rsplit(sep='|'))\n",
    "print(s1.rsplit(sep='|', maxsplit=1))  # 指定了最大分割次数后，左右分割就不一样了"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 6. 字符串的判断"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1. False\n",
      "2. True\n",
      "3. True\n",
      "4. True\n",
      "5. True\n",
      "6. True\n",
      "7. True\n",
      "8. False\n",
      "9. True\n",
      "10. False\n",
      "11. False\n",
      "12. True\n",
      "13. True\n",
      "14. True\n",
      "15. True\n",
      "16. True\n",
      "17. False\n"
     ]
    }
   ],
   "source": [
    "s = 'hello,Python'\n",
    "\n",
    "# 判断字符串是否合法\n",
    "print('1.', s.isidentifier())  # 合法：字母数字下划线\n",
    "print('2.', 'hello'.isidentifier())\n",
    "print('3.', '张三_'.isidentifier())\n",
    "print('4.', '张三_123'.isidentifier())\n",
    "\n",
    "# 判断字符串是否全部由空白字符组成（回车、换行、水平制表符）\n",
    "print('5.', '\\t'.isspace())\n",
    "\n",
    "# 判断字符串是否全部由字母组成\n",
    "print('6.', 'abc'.isalpha())\n",
    "print('7.', '张三'.isalpha())\n",
    "print('8.', '张三1'.isalpha())\n",
    "\n",
    "# 判断字符串是否全部由十进制的数字组成\n",
    "print('9.', '123'.isdecimal())\n",
    "print('10.', '123四'.isdecimal())\n",
    "print('11.', 'ⅡⅢⅣ'.isdecimal())\n",
    "\n",
    "# 判断字符串是否全部由数字组成\n",
    "print('12.', '123'.isnumeric())\n",
    "print('13.', '123四'.isnumeric())  # T\n",
    "print('14.', 'ⅡⅢⅣ'.isnumeric())  # T\n",
    "\n",
    "# 判断字符串是否全部由字母和数字组成\n",
    "print('15.', 'abc1'.isalnum())\n",
    "print('16.', '张三123'.isalnum())  # T\n",
    "print('17.', 'abc!'.isalnum())  # F"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 7. 字符串的替换与合并"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "hello,Java\n",
      "hello,Java,Java,Python\n",
      "hello|Java|Python\n",
      "helloJavaPython\n",
      "helloJavaPython\n",
      "P*y*t*h*o*n\n"
     ]
    }
   ],
   "source": [
    "# 字符串的替换replace，第一个参数指定被替换的子串，第二个参数指定替换子串的字符串，该方法返回替换后得到的字符串，替换前字符串不发生变化，第三个参数指定最大替换次数\n",
    "s = 'hello,Python'\n",
    "print(s.replace('Python', 'Java'))\n",
    "s1 = 'hello,Python,Python,Python'\n",
    "print(s1.replace('Python', 'Java', 2))\n",
    "\n",
    "# 字符串的合并，将列表或元组中的字符串合并成一个字符串\n",
    "lst = ['hello', 'Java', 'Python']\n",
    "print('|'.join(lst))\n",
    "print(''.join(lst))\n",
    "\n",
    "t = ('hello', 'Java', 'Python')\n",
    "print(''.join(t))\n",
    "\n",
    "print('*'.join('Python'))  # P*y*t*h*o*n 将Python作为字符串序列去进行连接"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 8. 字符串的比较操作"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "True\n",
      "97 98\n",
      "False\n",
      "21016\n",
      "a b\n",
      "刘\n",
      "True\n",
      "True\n",
      "True\n",
      "True\n",
      "2352475873832\n",
      "2352475873832\n",
      "2352475873832\n"
     ]
    }
   ],
   "source": [
    "# 从前往后比较\n",
    "print('apple' > 'app')\n",
    "\n",
    "print(ord('a'), ord('b'))\n",
    "print('apple' > 'banana')  # 相当于97>98，False\n",
    "print(ord('刘'))\n",
    "\n",
    "# ord相反操作chr，获取字符\n",
    "print(chr(97), chr(98))\n",
    "print(chr(21016))\n",
    "\n",
    "\"\"\"\n",
    "== 与 is 的区别\n",
    "== 比较的是值\n",
    "is 比较的是id是否相等\n",
    "\"\"\"\n",
    "a = b = 'Python'\n",
    "c = 'Python'\n",
    "print(a == b)\n",
    "print(b == c)\n",
    "print(a is b)\n",
    "print(b is c)\n",
    "print(id(a))\n",
    "print(id(b))\n",
    "print(id(c))  # 指向同一个内存空间，字符串的驻留机制"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 9. 字符串的切片操作"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "hello\n",
      "Python\n",
      "hello!Python\n",
      "2352520031472\n",
      "2352520391680\n",
      "2352520391120\n",
      "2352479202304\n",
      "2352520014832\n",
      "ello\n",
      "hloPto\n",
      "nohtyP,olleh\n",
      "Python\n"
     ]
    }
   ],
   "source": [
    "s = 'hello,Python'\n",
    "s1 = s[:5]  # 由于没有指定起始位置，从0开始切\n",
    "print(s1)\n",
    "s2 = s[6:]  # 指定起始位置未指定结束位置，切到结束\n",
    "print(s2)\n",
    "s3 = '!'\n",
    "new_str = s1 + s3 + s2\n",
    "print(new_str)\n",
    "\n",
    "# id都不同\n",
    "print(id(s))\n",
    "print(id(s1))\n",
    "print(id(s2))\n",
    "print(id(s3))\n",
    "print(id(new_str))\n",
    "\n",
    "# 完整写法：[start:end:step]\n",
    "print(s[1:5:1])  # 从1开始切到5，不包括5，步长为1\n",
    "print(s[::2])  # 0246，步长为间隔，默认从开始到结束，步长为2\n",
    "print(s[::-1])  # 倒置，默认从字符串的最后一个到第一个，因为步长为负数\n",
    "print(s[-6::1])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 10. 格式化字符串"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "我叫张三，今年20岁\n",
      "我叫张三，今年20岁\n",
      "我叫张三，今年20岁\n",
      "99\n",
      "        99\n",
      "hellohello\n",
      "3.141593\n",
      "3.142\n",
      "     3.142\n",
      "hellohello\n",
      "3.1415926\n",
      "3.1415926\n",
      "3.14\n",
      "3.142\n",
      "3.142\n",
      "     3.142\n"
     ]
    }
   ],
   "source": [
    "# %占位符\n",
    "name = '张三'\n",
    "age = 20\n",
    "print('我叫%s，今年%d岁' % (name, age))\n",
    "\n",
    "# {}占位符\n",
    "print('我叫{0}，今年{1}岁'.format(name, age))\n",
    "\n",
    "# f-string\n",
    "print(f'我叫{name}，今年{age}岁')\n",
    "\n",
    "print('%d' % 99)\n",
    "print('%10d' % 99)  # 10为宽度\n",
    "print('hellohello')\n",
    "print('%f' % 3.1415926)\n",
    "print('%.3f' % 3.1415926)\n",
    "print('%10.3f' % 3.1415926)  # 同时表示宽度和精度，总宽度为10，小数点后3位\n",
    "print('hellohello')\n",
    "\n",
    "print('{}'.format(3.1415926))\n",
    "print('{0}'.format(3.1415926))  # 最好写上数字\n",
    "print('{0:.3}'.format(3.1415926))  # .3表示一共三位数字\n",
    "print('{0:.3f}'.format(3.1415926))  # .3f表示三位小数\n",
    "print('{:.3f}'.format(3.1415926))\n",
    "print('{:10.3f}'.format(3.1415926))  # 10表示宽度，.3f表示三位小数，同时表示宽度和精度"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 11. 字符串的编码与解码"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "b'\\xcc\\xec\\xd1\\xc4\\xb9\\xb2\\xb4\\xcb\\xca\\xb1'\n",
      "b'\\xe5\\xa4\\xa9\\xe6\\xb6\\xaf\\xe5\\x85\\xb1\\xe6\\xad\\xa4\\xe6\\x97\\xb6'\n",
      "天涯共此时\n",
      "天涯共此时\n"
     ]
    }
   ],
   "source": [
    "\"\"\"\n",
    "为什么需要字符串的编码转换：str在内存中以unicode表示，从A计算机到B计算机，需要用byte字节传输\n",
    "编码：将字符串转换为二进制数据（bytes）\n",
    "解码：将bytes类型的数据转换为字符串类型\n",
    "\"\"\"\n",
    "s = '天涯共此时'\n",
    "# 编码\n",
    "print(s.encode(encoding='GBK'))  # 在GBK这种编码格式中，一个中文占两个字节\n",
    "print(s.encode(encoding='UTF-8'))  # 在UTF-8这种编码格式中，一个中文占三个字节\n",
    "\n",
    "# 解码\n",
    "# byte代表的是一个二进制数据（字节类型的数据）\n",
    "byte = s.encode(encoding='GBK')\n",
    "print(byte.decode(encoding='GBK'))  # 编码解码格式必须相同\n",
    "byte1 = s.encode(encoding='UTF-8')\n",
    "print(byte1.decode(encoding='UTF-8'))\n",
    "\n",
    "# 主要在爬虫中应用"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3.6.3",
   "language": "python",
   "name": "python3.6.3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.3"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": false,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {
    "height": "calc(100% - 180px)",
    "left": "10px",
    "top": "150px",
    "width": "480px"
   },
   "toc_section_display": true,
   "toc_window_display": true
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
